Abstract

In the context of percolation in a regular tree, we study the size of the largest cluster and
the length of the longest run starting within the first $d$ generations. As $d$ tends to infinity, we
prove almost sure and weak convergence results.

1 Introduction

Fix a positive integer $r$ and let $\mathbb{T}$ be the infinite $r$-ary tree, rooted at $\rho_0$. We consider a Bernoulli
percolation on $\mathbb{T}$. Formally, to each node $v \in \mathbb{T}$, we associate a random variable $X_v$, where the
variables $\{X_v : v \in \mathbb{T}\}$ are i.i.d. Bernoulli with $P(X_v = 1) = 1 - P(X_v = 0) = p \in (0, 1)$. For a
subset $A \subset \mathbb{T}$, let $X_A = \prod_{v \in A} X_v$. We say that $A$ is open if $X_A = 1$.




1.1 The size of the largest cluster.

We use the term cluster to denote a connected component (i.e. subtree) of $\mathbb{T}$ when undirected. Let
$\mathcal{K}$ denote the set of clusters in $\mathbb{T}$. For a node $v \in \mathbb{T}$, let $\mathrm{gen}(v)$ be its generation, i.e. the number
of nodes in the shortest path from the root $\rho_0$ to $v$, not counting $\rho_0$. Note that $\mathrm{gen}(\rho_0) = 0$.
Let $\mathbb{T}_d$ be the set of nodes with generation not exceeding $d$, namely $\mathbb{T}_d = \{v \in \mathbb{T} : \mathrm{gen}(v) \le d\}$.
For a cluster $A \in \mathcal{K}$, we let $|A|$ denote its size (i.e. number of nodes) and $\rho(A)$ its root, namely
$\rho(A) = \arg\min\{\mathrm{gen}(v) : v \in A\}$. For $d \in \mathbb{N}$, define $K_d$ to be the size of the largest open cluster
with root of generation not exceeding $d$:

\[ K_d = \max\{|A| : A \in \mathcal{K},\ \rho(A) \in \mathbb{T}_d,\ X_A = 1\}. \]

In particular, $K_0$ is the size of the largest open cluster containing the root $\rho_0$.
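
The definition of $K_d$ can be explored by simulation. The following Python sketch is not taken from the paper: the function name `sample_largest_cluster` and the lazy depth-first exploration are illustrative assumptions. It draws the states $X_v$ on demand and uses the recursion $K(v) = X_v (1 + \sum_c K(c))$ over the children $c$ of $v$, where $K(v)$ is the size of the open cluster rooted at $v$. It refuses $rp > 1$, where the exploration may not terminate.

```python
import random


def sample_largest_cluster(r, p, d, rng):
    """Sample K_d, the size of the largest open cluster rooted in generations 0..d.

    Hypothetical helper (not from the paper). Node states are drawn lazily,
    so only finitely many nodes are visited (almost surely, when r * p <= 1).
    """
    if r * p > 1:
        raise ValueError("supercritical case: exploration may not terminate")
    best = 0
    root_open = rng.random() < p
    # Each frame: [generation, is_open, children_left, open_cluster_size]
    stack = [[0, root_open, r, 1 if root_open else 0]]
    while stack:
        frame = stack[-1]
        gen, is_open, children_left, _ = frame
        # A child matters if it can extend an open cluster (parent open)
        # or if it can itself be a cluster root (its generation is <= d).
        if children_left > 0 and (is_open or gen < d):
            frame[2] -= 1
            child_open = rng.random() < p
            stack.append([gen + 1, child_open, r, 1 if child_open else 0])
        else:
            stack.pop()
            size = frame[3]
            if gen <= d:
                best = max(best, size)  # K(v) for a candidate root v
            if stack and stack[-1][1]:
                stack[-1][3] += size  # an open parent accumulates K(child)
    return best
```

For instance, `sample_largest_cluster(2, 0.3, 10, random.Random(0))` draws one realization of $K_{10}$ in a subcritical binary tree. In the critical case the cluster sizes are heavy-tailed, so individual draws can take long.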

In this paper we study the limit behavior of $K_d$ as $d \to \infty$. In the context of the one-dimensional
lattice $\mathbb{Z}$, the corresponding results are often referred to as the Erdős-Rényi Law [8] and, in that
context, our approach follows that of Arratia, Goldstein and Gordon [2]. In higher dimensions, 
the problem is much more intricate and many questions remain without answer. For a sample 
of sophisticated results, see e.g. [5, 16, 17]. The book by Grimmett [10] is a standard reference 
on percolation. For references more specific to trees, we refer the reader to a survey paper by 
Pemantle [15] and the book of Lyons and Peres [13]. Though the literature on percolation is vast, 
most of it focuses on the existence of an infinite cluster and its characteristics when it exists. On 
the applications side, Patil and Taillie [14] identify regions of interest in a network by thresholding
the response from each site in the network and computing connected components, which amounts 
to extracting the open clusters. One imagines that the largest cluster might receive the most 
attention. In particular, they mention monitoring water quality in a network of freshwater streams, 
where each stream may be modeled as a tree, though an irregular one. 

It is well-known that, in the supercritical setting where $p > 1/r$, the cluster at the origin has
positive probability of being infinite, and, in fact, $P(K_d = \infty)$ tends to 1 as $d$ increases. We restrict
our attention to the subcritical and critical cases, i.e. $p < 1/r$ and $p = 1/r$ respectively, where the
cluster at the origin is finite with probability one. We start with the critical case, where we show
that $K_d$ behaves like the maximum of $r^d$ independent random variables with distribution the total
progeny of a Galton-Watson process with offspring distribution $\mathrm{Bin}(r, 1/r)$. Let $\log_r$ denote the
logarithm in base $r$.

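Theorem 1. Assume $p = 1/r$. Then with probability one,

\[ \frac{\log_r K_d}{d} \to 2, \quad d \to \infty. \]

Moreover, for any $x \in \mathbb{R}$,

\[ P(\log_r K_d \le 2d + x) \to \exp(-C_1 r^{-x/2}), \quad d \to \infty, \qquad C_1 := \frac{2}{\sqrt{2\pi r (r-1)}}. \]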

For the subcritical case, we obtain similar results without transformation. Here, a Poisson approximation
applies, showing that $K_d$ behaves like the maximum of $|\mathbb{T}_d| = (r^{d+1} - 1)/(r - 1)$ independent
random variables with distribution the total progeny of a Galton-Watson process with offspring
distribution $\mathrm{Bin}(r, p)$. Define

\[ \kappa := r p (1-p)^{r-1} \Big( \frac{r}{r-1} \Big)^{r-1}. \tag{1} \]

Note that $\kappa < 1$ for all $p < 1/r$. Let $[x]$ denote the integer part of $x \in \mathbb{R}$.
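
As a quick numerical illustration, the sketch below evaluates $\kappa$ and the almost sure growth rate $1/\log_r(1/\kappa)$ of Theorem 2. It assumes the form $\kappa = r p (1-p)^{r-1} (r/(r-1))^{r-1}$, which is the one consistent with the Stirling asymptotics of $\mathrm{Cat}_n p^n (1-p)^{(r-1)n+1}$; the function names are hypothetical and $r \ge 2$ is assumed.

```python
from math import log


def kappa(p, r):
    """kappa = r p (1-p)^(r-1) (r/(r-1))^(r-1); assumed form of (1), for r >= 2."""
    return r * p * (1.0 - p) ** (r - 1) * (r / (r - 1)) ** (r - 1)


def cluster_growth_rate(p, r):
    """Almost sure limit of K_d / d in the subcritical case: 1 / log_r(1/kappa)."""
    k = kappa(p, r)
    if not 0.0 < k < 1.0:
        raise ValueError("requires 0 < p < 1/r")
    return log(r) / log(1.0 / k)
```

For $r = 2$ the formula reduces to $\kappa = 4p(1-p)$, which equals 1 exactly at the critical value $p = 1/2$.
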
Theorem 2. Assume $p < 1/r$. Then with probability one,

\[ \frac{K_d}{d} \to \frac{1}{\log_r(1/\kappa)}, \quad d \to \infty. \]

Moreover, the sequence of random variables $(K_d - \mu_d : d \ge 0)$ is tight, where $\mu_d := \dfrac{\cdots}{\log_r(1/\kappa)}$. In
addition, a subsequence of $K_d - \mu_d$ converges weakly if, and only if, $a := \lim_{d \to \infty} (\mu_d - [\mu_d])$ exists, in
which case the weak limit is $[Z + a] - a$, where

\[ P(Z \le z) = \exp(-C_2 \kappa^z), \]

for an explicit constant $C_2 > 0$ depending only on $(p, r)$.

The behavior of the size of the largest open cluster in the subcritical regime is therefore similar 
in the context of the regular tree and in the context of the one-dimensional lattice, the latter 
corresponding to the length of the longest perfect head run in a sequence of coin tosses [2, Ex. 3]. 



1.2 The length of the longest run. 

We use the term run for a path in $\mathbb{T}$ when directed away from the root $\rho_0$. Note that runs are
special clusters. Let $\mathcal{R}$ denote the set of runs and define $R_d$ to be the length of the longest open
run with root of generation not exceeding $d$:

\[ R_d = \max\{|A| : A \in \mathcal{R},\ \rho(A) \in \mathbb{T}_d,\ X_A = 1\}. \]

Of course, runs and clusters coincide in the one-dimensional lattice Z. For a general reference 
on runs in dimension one, see [4]. Using the Chen-Stein method, Chen and Huo [6] proved results 
on the longest left-right run in a thin two-dimensional lattice of the form $([0, d] \times [0, a]) \cap \mathbb{Z}^2$, with
the width a remaining constant. We also mention the work of Arias-Castro, Donoho and Huo [1] 
who used a statistic based on the longest run in a particular, non-planar graph to detect filaments 
in point-clouds. 

The results we obtain for runs are parallel to those we obtain for clusters. In the critical case,
we show that $R_d$ behaves like the maximum of $r^d$ independent random variables with distribution
the height of a Galton-Watson process with offspring distribution $\mathrm{Bin}(r, 1/r)$.

Theorem 3. Assume $p = 1/r$. Then with probability one,

\[ \frac{\log_r R_d}{d} \to 1, \quad d \to \infty. \]

Moreover, for any $x \in \mathbb{R}$,

\[ P(\log_r R_d \le d + x) \to \exp(-C_3 r^{-x}), \quad d \to \infty, \qquad C_3 := \frac{2}{r-1}. \]



In the subcritical case, we show that $R_d$ behaves like the maximum of $|\mathbb{T}_d|$ independent random
variables with distribution the height of a Galton-Watson process with offspring distribution
$\mathrm{Bin}(r, p)$. Again, a Poisson approximation applies. The constant that appears in the exponent is
only defined implicitly.

Theorem 4. Assume $p < 1/r$. Then with probability one,

\[ \frac{R_d}{d} \to \frac{1}{\log_r(1/p) - 1}, \quad d \to \infty. \]

Moreover, the sequence of random variables $(R_d - \nu_d : d \ge 0)$ is tight, where $\nu_d := \dfrac{d}{\log_r(1/p) - 1}$. In
addition, a subsequence of $R_d - \nu_d$ converges weakly if, and only if, $a := \lim_{d \to \infty} (\nu_d - [\nu_d])$ exists, in
which case the weak limit is $[Z + a] - a$, where

\[ P(Z \le z) = \exp(-C_4 (rp)^z), \]

for an explicit constant $C_4 > 0$ depending only on $(p, r)$.

1.3 Contents. 

The rest of the paper is devoted to proving our results. In Section 2 we prove Theorem 1 and 
Theorem 2. In Section 3 we prove Theorem 3 and Theorem 4. 



1.4 Additional Notation. 

Let $\partial\mathbb{T}_d = \{v \in \mathbb{T} : \mathrm{gen}(v) = d\}$. For a cluster $A$, let $\underline{A}$ denote the set of nodes not in $A$
whose parents belong to $A$, and if $\rho(A) \ne \rho_0$, let $\overline{A}$ denote the parent of $\rho(A)$. Also, define
$(1 - X)_A = \prod_{v \in A} (1 - X_v)$. For two sequences of real numbers $(a_n)$ and $(b_n)$, we use the notation
$a_n \sim b_n$ to indicate that $a_n/b_n \to 1$ and $a_n \asymp b_n$ to indicate that the ratio $a_n/b_n$ is bounded away
from zero and infinity, both understood as $n \to \infty$. Throughout the paper $C$ denotes a finite,
positive constant depending only on $r$ and $p$, whose value may change with each appearance.

2 The size of the largest open cluster 

In this section, we prove Theorem 1 and Theorem 2. We start with some notation. For a vertex
$v \in \mathbb{T}$, let $K(v)$ be the size of the largest open cluster with root $v$,

\[ K(v) = \max\{|A| : A \in \mathcal{K},\ \rho(A) = v,\ X_A = 1\}. \]

In particular,

\[ K_d = \max\{K(v) : v \in \mathbb{T}_d\}. \]

The distribution of $K(v)$ does not depend on $v \in \mathbb{T}$, and, in fact, given $X_v = 1$, coincides with
that of the total progeny of a Galton-Watson tree starting with one individual and with offspring
distribution $\mathrm{Bin}(r, p)$. Define

\[ \psi_n = P(K(v) = n), \qquad \Psi_n = P(K(v) > n). \]

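Applying a well-known identity by Dwass [7] (called the Otter-Dwass formula in [13]), we get, for $n \ge 1$,

\[ \psi_n = \frac{p}{n} P(\xi_1 + \cdots + \xi_n = n - 1), \quad \text{where } \xi_1, \dots, \xi_n \overset{\text{iid}}{\sim} \mathrm{Bin}(r, p), \]
\[ \phantom{\psi_n} = \frac{p}{n} P(\mathrm{Bin}(nr, p) = n - 1) = \mathrm{Cat}_n \, p^n (1-p)^{n(r-1)+1}, \]

where

\[ \mathrm{Cat}_n = \frac{1}{n} \binom{nr}{n-1} = \frac{1}{(r-1)n + 1} \binom{rn}{n} \]

is the $n$th generalized Catalan number [11], which among other interpretations, is the number of
subtrees of $\mathbb{T}$ of size $n$ rooted at the origin, i.e.

\[ \mathrm{Cat}_n = |\{A \in \mathcal{K} : \rho(A) = \rho_0,\ |A| = n\}|. \]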
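
The two expressions for $\psi_n$ can be compared numerically. The sketch below is an illustration rather than part of the paper; the function names are hypothetical. It computes $\psi_n$ once through the generalized Catalan number and once through the Dwass form $(p/n) P(\mathrm{Bin}(nr, p) = n - 1)$, using exact integer binomial coefficients.

```python
from math import comb


def generalized_catalan(n, r):
    """Cat_n = binom(r n, n) / ((r - 1) n + 1), an integer for n >= 1."""
    return comb(r * n, n) // ((r - 1) * n + 1)


def psi_catalan(n, p, r):
    """psi_n = Cat_n p^n (1-p)^((r-1) n + 1)."""
    return generalized_catalan(n, r) * p ** n * (1 - p) ** ((r - 1) * n + 1)


def psi_dwass(n, p, r):
    """psi_n = (p / n) P(Bin(n r, p) = n - 1)."""
    binom_pmf = comb(n * r, n - 1) * p ** (n - 1) * (1 - p) ** (n * r - n + 1)
    return p / n * binom_pmf
```

In the subcritical case the values $\psi_n$, $n \ge 1$, should sum to $P(X_v = 1) = p$, since the open cluster rooted at an open node is then finite with probability one.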

We could have obtained the expression for $\psi_n$ using this definition of $\mathrm{Cat}_n$. Indeed, for $n > 0$,
$K(v) = n$ if, and only if, there is a (unique) subtree $A$ with $|A| = n$, $\rho(A) = v$ and $X_A (1 - X)_{\underline{A}} = 1$
(all nodes of $A$ are open and all children of $A$ outside $A$ are closed),
so that $A$ cannot be extended and still be an open cluster. We then use the fact that a subtree
of size $n$ has exactly $(r - 1)n + 1$ children. With the use of Stirling's formula, we arrive at the
following conclusions; see also [3, 12].

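Lemma 1. In the critical case $p = 1/r$,

\[ \Psi_n \sim C_1 n^{-1/2}, \qquad C_1 = \frac{2}{\sqrt{2\pi r (r-1)}}. \]

In the subcritical case $p < 1/r$,

\[ \Psi_n \sim C_5 \frac{\kappa^{n+1}}{n^{3/2}}, \qquad C_5 := \frac{1}{\sqrt{2\pi}\,(1 - \kappa)} \cdot \frac{(1-p)\, r^{1/2}}{(r-1)^{3/2}}. \]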
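
The Stirling step behind Lemma 1 can be checked numerically at the level of $\psi_n$. The sketch below is an illustration under stated assumptions, not the paper's computation: it assumes the asymptotics $\psi_n \sim n^{-3/2}/\sqrt{2\pi r(r-1)}$ at $p = 1/r$ and $\psi_n \sim (1-p)\sqrt{r}\, \kappa^n n^{-3/2} / (\sqrt{2\pi}(r-1)^{3/2})$ for $p < 1/r$, from which the tail asymptotics follow by summation. It works on the log scale via `lgamma` to avoid overflow; the function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt


def log_psi(n, p, r):
    """log of psi_n = Cat_n p^n (1-p)^((r-1) n + 1), computed via lgamma."""
    log_cat = (lgamma(r * n + 1) - lgamma(n + 1) - lgamma((r - 1) * n + 1)
               - log((r - 1) * n + 1))
    return log_cat + n * log(p) + ((r - 1) * n + 1) * log(1 - p)


def critical_ratio(n, r):
    """psi_n * n^(3/2) * sqrt(2 pi r (r-1)) at p = 1/r; expected to approach 1."""
    return exp(log_psi(n, 1.0 / r, r) + 1.5 * log(n)) * sqrt(2 * pi * r * (r - 1))


def subcritical_ratio(n, p, r):
    """psi_n divided by its assumed asymptotic form; expected to approach 1."""
    log_kappa = log(r * p) + (r - 1) * (log(1 - p) + log(r / (r - 1)))
    log_const = log((1 - p) * sqrt(r)) - log(sqrt(2 * pi) * (r - 1) ** 1.5)
    return exp(log_psi(n, p, r) + 1.5 * log(n) - n * log_kappa - log_const)
```

Both ratios should approach 1 as $n$ grows, with a relative error of order $1/n$.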



2.1 Proof of Theorem 1 

Define

\[ K^\partial_d := \max\{K(v) : v \in \partial\mathbb{T}_d\}. \]

We first prove that the conclusions of Theorem 1 hold for $K^\partial_d$. For $x \in \mathbb{R}$, let $n_d(x) = [r^{2d+x}]$. As
$K^\partial_d$ only involves independent random variables, we have

\[ P(\log_r K^\partial_d \le 2d + x) = P(K^\partial_d \le n_d(x)) = (1 - \Psi_{n_d(x)})^{r^d} = \exp(-C_1 r^{-x/2} + O(r^{-d-x})). \]

Letting $d \to \infty$, we obtain the weak convergence, and by choosing $x = \varepsilon d$, with $\varepsilon > -2$ fixed, and
applying the Borel-Cantelli Lemma, we obtain the almost sure convergence.

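It therefore suffices to show that $K_d = (1 + o_P(1)) K^\partial_d$. Clearly, $K_d \ge K^\partial_d$, so we focus on the
upper bound. Define

\[ B_d = \{v \in \partial\mathbb{T}_d : K(v) > r^d/d\}, \qquad B^+_d = \{v \in \partial\mathbb{T}_d : K(v) > r^{2d}/d\}. \]

For any open cluster $A$ with $\rho(A) \in \mathbb{T}_d$, we have

\[ |A| \le |A \cap \mathbb{T}_{d-1}| + \sum_{v \in A \cap \partial\mathbb{T}_d} K(v) \le r^d + r^{2d}/d + \sum_{v \in A \cap B_d} K(v). \]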

We turn to bounding the sum. We first show that, with probability tending to one, there is no
open cluster $A$ containing three or more nodes in $B_d$. Indeed, take $v_1, v_2, v_3 \in \partial\mathbb{T}_d$ distinct. Let $w$
denote their most recent common ancestor and let $k = d - \mathrm{gen}(w)$. Either the paths $v_j \to \rho_0$ meet
at $w$ for the first time, in which case we let $\ell = k$, or two of the paths meet at a node $u$ with
$\mathrm{gen}(u) > \mathrm{gen}(w)$, in which case we let $\ell = d - \mathrm{gen}(u)$. Now, the nodes $v_1, v_2, v_3$ belong to the same open cluster if, and only if,
the smallest subtree containing $w$ and $v_1, v_2, v_3$ is open, and this subtree is of size $\ell + 2k + 1$, and
therefore, the probability that they belong to the same open cluster is $p^{\ell + 2k + 1}$. In addition, the
number of such triplets is bounded by

\[ C \, r^{d-k} \cdot r^{k-\ell} \cdot r^{2\ell} \cdot r^{k} = C \, r^{d+k+\ell}. \]

The first factor comes from the fact that the three nodes are leaves of a subtree with root at
generation $d - k$. Given that, the second factor comes from the fact that two of them belong to a
subtree of that subtree with root at (relative) generation $k - \ell$. The last two factors bound the number of
choices for these two leaves and for the third leaf. Hence, remembering that $p = 1/r$
and using Lemma 1, we have

\[ P(\exists A \in \mathcal{K} : X_A = 1,\ |A \cap B_d| \ge 3) \le C \, P(K(v) > r^d/d)^3 \sum_{k=1}^{d} \sum_{\ell=1}^{k} r^{d+k+\ell} p^{\ell + 2k + 1} \]
\[ \le C (r^d/d)^{-3/2} \, r^{d} = C d^{3/2} r^{-d/2}. \]

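By the same token, with probability tending to one (in fact, the complementary probability is of order at most $d/r^d$), there is no open
cluster $A$ containing two or more nodes in $B^+_d$ (the nodes $v \in \partial\mathbb{T}_d$ with $K(v) > r^{2d}/d$). Now, when $|A \cap B_d| \le 2$ and $|A \cap B^+_d| \le 1$, we have

\[ \sum_{v \in A \cap B_d} K(v) \le \max_{v \in \partial\mathbb{T}_d} K(v) + r^{2d}/d = K^\partial_d + r^{2d}/d. \]

In the end, with probability tending to one,

\[ |A| \le r^d + 2 r^{2d}/d + K^\partial_d, \]

for any open cluster $A$ with $\rho(A) \in \mathbb{T}_d$. Hence,

\[ K_d \le K^\partial_d + O_P(r^{2d}/d), \]

and we conclude by the fact that $K^\partial_d$ is of order exceeding $r^{2d}/d$ with probability tending to one.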



2.2 Proof of Theorem 2 

The proof of the almost sure convergence may be obtained following the arguments provided in
Section 2.1 or using the bounds we are about to prove below. We omit the details.

The proof of the weak convergence is based on the Chen-Stein method for Poisson approximation
as formulated by Arratia, Goldstein and Gordon [2]. Define

\[ Y_A = \begin{cases} X_A (1 - X)_{\underline{A}}, & \rho(A) = \rho_0, \\ X_A (1 - X)_{\underline{A}} (1 - X)_{\overline{A}}, & \rho(A) \ne \rho_0. \end{cases} \]

Also, let $\mathcal{K}_{d,n}$ be the set of clusters of size exceeding $n$ with root in $\mathbb{T}_d$, and define

\[ W_{d,n} = \sum_{A \in \mathcal{K}_{d,n}} Y_A. \]

By definition, $\{K_d \le n\} = \{W_{d,n} = 0\}$.

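We approximate the law of $W_{d,n}$ by the Poisson distribution with the same mean $\lambda_{d,n} = E(W_{d,n})$. We
start by estimating $\lambda_{d,n}$ using Lemma 1, obtaining

\[ \lambda_{d,n} = \sum_{A \in \mathcal{K}_{d,n}} P(Y_A = 1) = P(K(\rho_0) > n) + (1-p) \sum_{v \in \mathbb{T}_d,\, v \ne \rho_0} P(K(v) > n) \]
\[ \phantom{\lambda_{d,n}} = \Psi_n + (1-p)(|\mathbb{T}_d| - 1) \Psi_n. \]

In particular, as $n, d \to \infty$,

\[ \lambda_{d,n} \sim C_2 \, r^d n^{-3/2} \kappa^{n+1}, \qquad C_2 := \frac{C_5 (1-p)\, r}{r - 1}, \]

with $C_5$ the constant of Lemma 1.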

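For a cluster $A \in \mathcal{K}_{d,n}$, define its neighborhood $\mathcal{B}(A)$ as the set of clusters $B \in \mathcal{K}_{d,n}$ such that

\[ (B \cup \overline{B} \cup \underline{B}) \cap (A \cup \overline{A} \cup \underline{A}) \ne \emptyset. \]

Define the following sums

\[ F_{d,n} = \sum_{A \in \mathcal{K}_{d,n}} \sum_{B \in \mathcal{B}(A)} P(Y_A = 1) P(Y_B = 1), \]
\[ G_{d,n} = \sum_{A \in \mathcal{K}_{d,n}} \sum_{B \in \mathcal{B}(A),\, B \ne A} P(Y_A = Y_B = 1), \]
\[ H_{d,n} = \sum_{A \in \mathcal{K}_{d,n}} E \big| E\big( Y_A - E(Y_A) \,\big|\, Y_B, B \notin \mathcal{B}(A) \big) \big|. \]

Then by the second part of [2, Th. 1],

\[ |P(W_{d,n} = 0) - \exp(-\lambda_{d,n})| \le F_{d,n} + G_{d,n} + H_{d,n}. \]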

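For $x \in \mathbb{R}$, define $n_d(x) = [\mu_d + x]$. When $x$ is fixed and $d \to \infty$, $\lambda_{d,n_d(x)} \asymp 1$, with

\[ \lambda_{d,n_d(x)} \to C_2 \, \kappa^{[a+x]-a+1} \quad \text{when } \mu_d - [\mu_d] \to a,\ x - [x] \ne 1 - a, \]

with

\[ P([Z + a] - a \le x) = \exp(-C_2 \, \kappa^{[a+x]-a+1}). \]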

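Therefore, to conclude it suffices to prove that $F_{d,n}, G_{d,n}, H_{d,n} \to 0$ when $d, n \to \infty$ in such a way
that $\lambda_{d,n} \asymp |\mathbb{T}_d| \Psi_n \asymp 1$. First, $H_{d,n} = 0$ by independence of $Y_A$ and $\{Y_B, B \notin \mathcal{B}(A)\}$. For $G_{d,n}$, the
only pairs $A, B \in \mathcal{K}_{d,n}$ that contribute to the sum satisfy either $\overline{B} \in \underline{A}$ or $\overline{A} \in \underline{B}$, and in both cases

\[ P(Y_A = Y_B = 1) = (1-p)^{-1} P(Y_A = 1) P(Y_B = 1). \]

Hence, using the fact that there are $\mathrm{Cat}_m$ subtrees of size $m$ with a given root, each with $(r-1)m + 1$
children, and then Lemma 1, we have

\[ G_{d,n} \le 2 (1-p)^{-1} |\mathbb{T}_d| \sum_{m > n} \mathrm{Cat}_m \, p^m (1-p)^{(r-1)m+1} ((r-1)m + 1) \cdot \Psi_n \le C \, |\mathbb{T}_d| \Psi_n \Big( \sum_{m > n} m \, \psi_m \Big) \to 0. \]

For $F_{d,n}$, the only pairs $A, B \in \mathcal{K}_{d,n}$ that contribute to the sum satisfy either $\overline{B} \in A \cup \overline{A} \cup \underline{A}$ or
$\overline{A} \in B \cup \overline{B} \cup \underline{B}$. The computations are then similar.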

3 The length of the longest open run 

The arguments are parallel to those provided in Section 2. For $A \subset \mathbb{T}$, define its height as $\tau(A) =
\sup\{\mathrm{gen}(v) : v \in A\} - \mathrm{gen}(\rho(A))$. For a vertex $v \in \mathbb{T}$, let $R(v)$ be the length of the longest open run with
root $v$,

\[ R(v) = 1 + \max\{\tau(A) : A \in \mathcal{K},\ \rho(A) = v,\ X_A = 1\}. \]

In particular,

\[ R_d = \max\{R(v) : v \in \mathbb{T}_d\}. \]

The distribution of $R(v)$ does not depend on $v \in \mathbb{T}$, and, in fact, given $X_v = 1$, coincides with that
of the height (plus one), i.e. extinction time, of a Galton-Watson tree with offspring distribution
$\mathrm{Bin}(r, p)$. Define

\[ \phi_h = P(R(v) = h), \qquad \Phi_h = P(R(v) > h). \]

We have the following results on the asymptotic behavior of $\Phi_h$ [3].
Lemma 2. In the critical case $p = 1/r$,

\[ \Phi_h \sim \frac{C_3}{h}, \qquad C_3 = \frac{2}{r-1}. \]

In the subcritical case $p < 1/r$, there is an implicit constant $C_6 > 0$ such that

\[ \Phi_h \sim C_6 \, (rp)^h. \]
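
The tail $\Phi_h$ has no closed form, but it can be computed by the standard generating-function iteration for extinction times: with $f(s) = (1 - p + ps)^r$, the probability that a Galton-Watson tree with $\mathrm{Bin}(r, p)$ offspring survives to generation $h$ is $1 - f^{(h)}(0)$. The sketch below is an illustration, not the paper's argument; the function name is hypothetical. It iterates on the survival probability directly, using `expm1` and `log1p` to avoid cancellation, and multiplies by $p$ to account for the event $X_v = 1$.

```python
from math import expm1, log1p


def run_tail(p, r, h_max):
    """Return [Phi_0, ..., Phi_{h_max}] with Phi_h = P(R(v) > h).

    Phi_h = p * P(Z_h > 0), where Z is a Galton-Watson process with
    Bin(r, p) offspring started from one individual.
    """
    survival = 1.0  # P(Z_0 > 0)
    tails = []
    for _ in range(h_max + 1):
        tails.append(p * survival)
        # 1 - f(1 - s) = 1 - (1 - p s)^r, evaluated in a numerically stable way
        survival = -expm1(r * log1p(-p * survival))
    return tails
```

In the critical case the product $h \, \Phi_h (r-1)/2$ should approach 1, slowly, with a relative error of order $\log(h)/h$. In the subcritical case the ratio $\Phi_{h+1}/\Phi_h$ should approach $rp$.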

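Let $\mathrm{Cat}_{n,h}$ denote the number of subtrees rooted at the origin, of size $n$ and height $h$. See [9]
for some results on $\mathrm{Cat}_{n,h}$. As in Section 2, we can argue that

\[ \phi_h = \sum_{n \ge h} \mathrm{Cat}_{n,h} \, p^n (1-p)^{(r-1)n+1}, \quad \text{implying} \quad \Phi_h = \sum_{\ell > h} \sum_{n \ge \ell} \mathrm{Cat}_{n,\ell} \, p^n (1-p)^{(r-1)n+1}. \]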



3.1 Proof of Theorem 3 

The proof is based on the following observation:

\[ R^\partial_d \le R_d \le R^\partial_d + d, \qquad R^\partial_d := \max\{R(v) : v \in \partial\mathbb{T}_d\}, \]

where the $d$ term bounds the length of any run in $\mathbb{T}_{d-1}$. As $R^\partial_d$ only involves independent random
variables,

\[ P(R^\partial_d \le h) = (1 - \Phi_h)^{r^d}. \tag{2} \]

Choosing $h = [r^{d/2}]$ and using Lemma 2, we obtain

\[ P(R^\partial_d \le r^{d/2}) \le \exp(-C r^{d/2}), \quad \text{for some } C > 0, \]

so that, applying the Borel-Cantelli Lemma, $R^\partial_d > r^{d/2}$ eventually, with probability one. Hence,
$R_d = (1 + o_P(1)) R^\partial_d$, so that $\log_r R_d = \log_r R^\partial_d + o_P(1)$, and it is therefore enough to prove the results for $R^\partial_d$ in place of $R_d$.
The almost sure convergence is obtained in a similar way by choosing $h = r^{(1+\varepsilon)d}$ with $\varepsilon$ fixed,
either positive or negative. For the weak convergence, fix $x$ and let $h_d(x) = [r^{d+x}]$. By Lemma 2
and (2), we have

\[ P(\log_r R^\partial_d \le d + x) \to \exp(-C_3 r^{-x}), \quad d \to \infty. \]
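
The weak limit can be compared with the exact finite-$d$ distribution of the boundary maximum, since (2) only requires $\Phi_h$. The sketch below is an illustration under the assumption $C_3 = 2/(r-1)$ (the value implied by the classical survival asymptotics of critical Galton-Watson processes); the function names are hypothetical, and $\Phi_h$ is computed by iterating the offspring generating function.

```python
from math import exp, expm1, log1p


def run_tail_at(p, r, h):
    """Phi_h = P(R(v) > h) = p * P(Galton-Watson survives to generation h)."""
    survival = 1.0
    for _ in range(h):
        survival = -expm1(r * log1p(-p * survival))
    return p * survival


def boundary_cdf(r, d, x):
    """Exact P(log_r R_d^boundary <= d + x) at p = 1/r, from identity (2)."""
    h = int(r ** (d + x))  # integer part of r^(d + x)
    phi = run_tail_at(1.0 / r, r, h)
    return exp(r ** d * log1p(-phi))


def limit_cdf(r, x):
    """Limit exp(-C_3 r^(-x)), with the assumed constant C_3 = 2 / (r - 1)."""
    return exp(-2.0 / (r - 1) * r ** (-x))
```

For $r = 2$ and $d = 12$, `boundary_cdf(2, 12, 0.0)` and `limit_cdf(2, 0.0)` should agree to about two decimal places.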



3.2 Proof of Theorem 4 

We again omit the details of the proof of the almost sure convergence and focus on proving the
weak convergence. Let $\mathcal{K}_{d,h}$ denote the set of clusters with root in $\mathbb{T}_d$ and height exceeding $h$. We
use the notation introduced in Section 2.2, with $\mathcal{K}_{d,h}$ in place of $\mathcal{K}_{d,n}$. By definition,

\[ \{R_d \le h\} = \{W_{d,h} = 0\}. \]

Using Lemma 2, we obtain

\[ \lambda_{d,h} = \sum_{A \in \mathcal{K}_{d,h}} P(Y_A = 1) = P(R(\rho_0) > h) + (1-p) \sum_{v \in \mathbb{T}_d,\, v \ne \rho_0} P(R(v) > h) \]
\[ \phantom{\lambda_{d,h}} = \Phi_h + (1-p)(|\mathbb{T}_d| - 1) \Phi_h. \]

In particular, as $h, d \to \infty$,

\[ \lambda_{d,h} \sim C_4 \, r^d (rp)^{h+1}, \qquad C_4 := \frac{C_6 (1-p)}{p (r-1)}. \]

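For $x \in \mathbb{R}$, define $h_d(x) = [\nu_d + x]$. When $x$ is fixed and $d \to \infty$, we have $\lambda_{d,h_d(x)} \asymp 1$, with

\[ \lambda_{d,h_d(x)} \to C_4 \, (rp)^{[a+x]-a+1} \quad \text{when } \nu_d - [\nu_d] \to a,\ x - [x] \ne 1 - a. \]

It then suffices to show that $F_{d,h}, G_{d,h}, H_{d,h} \to 0$ when $d, h \to \infty$ in such a way that $\lambda_{d,h} \asymp |\mathbb{T}_d| \Phi_h \asymp 1$,
and the computations are parallel to those in Section 2.2. We focus on $G_{d,h}$. Fix $\tilde{p} \in (p, 1/r)$
and let $\tilde{\Phi}_h$ be defined as $\Phi_h$, with $\tilde{p}$ in place of $p$. For $h$ large enough, we then have

\[ G_{d,h} = \sum_{A \in \mathcal{K}_{d,h}} \sum_{B \in \mathcal{B}(A),\, B \ne A} P(Y_A = Y_B = 1) \]
\[ \le C \, |\mathbb{T}_d| \sum_{\ell > h} \sum_{n \ge \ell} \mathrm{Cat}_{n,\ell} \, p^n (1-p)^{(r-1)n+1} ((r-1)n + 1) \cdot \Phi_h \]
\[ \le C \lambda_{d,h} \sum_{\ell > h} \sum_{n \ge \ell} \mathrm{Cat}_{n,\ell} \, \tilde{p}^n (1 - \tilde{p})^{(r-1)n+1} \]
\[ = C \lambda_{d,h} \tilde{\Phi}_h \asymp (r \tilde{p})^h \to 0, \quad d \to \infty. \]

The second inequality uses that $p (1-p)^{r-1} < \tilde{p} (1 - \tilde{p})^{r-1}$, so that the polynomial factor $(r-1)n + 1$ is absorbed for $n$ large.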